Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
translated by 谷歌翻译
在这项工作中,我们证明了多种语的大规模序列到序列(SEQ2SEQ)模型,该模型是通过Denoising和因果语言建模(CLM)任务的混合物进行训练的,比仅解码器模型更有效地进行了效率的学习者在各种任务上。特别是,我们培训了一个名为Alexa教师模型(Alexatm 20b)的200亿个参数多语言SEQ2SEQ模型,并表明它在1-Shot摘要任务上实现了最先进的(SOTA)性能,超过了更大的540B PALM DOPODER模型。 Alexatm 20b还可以在1-Shot Machine翻译中实现SOTA,尤其是对于低资源语言,几乎所有语言对(阿拉伯语,英语,法语,德语,德语,印地语,意大利语,日语,以及flores-101数据集上的泰卢固语)。我们还显示了零拍设置,AlexATM 20B在SuperGlue和SqueadV2数据集上的表现优于GPT3(175B),并在XNLI,XCOPA,PAWS-X和XWINOGRAD等多语言任务上提供SOTA性能。总体而言,我们的结果为SEQ2SEQ模型提供了一个令人信服的案例,作为大型语言模型(LLM)培训的仅解码器模型的强大替代方法。
translated by 谷歌翻译
我们介绍了一种新颖的方法,用于使用时间戳监督进行时间戳分割。我们的主要贡献是图形卷积网络,该网络以端到端方式学习,以利用相邻帧之间的帧功能和连接,以从稀疏的时间戳标签中生成密集的框架标签。然后可以使用生成的密集框架标签来训练分割模型。此外,我们为分割模型和图形卷积模型进行交替学习的框架,该模型首先初始化,然后迭代地完善学习模型。在四个公共数据集上进行了详细的实验,包括50种沙拉,GTEA,早餐和桌面组件,表明我们的方法优于多层感知器基线,同时在时间活动中表现出色或更好地表现出色或更好在时间戳监督下。
translated by 谷歌翻译
我们介绍了一个大规模实验,该实验对编码器进行了预处理,其参数计数范围从700m到9.3b不等,随后蒸馏到较小的型号中,范围为17m-170亿参数,其应用到自然语言理解(NLU)组件(NLU)组件(虚拟助手系统。尽管我们使用70%的口语数据训练,但在对书面形式的跨语性自然语言推论(XNLI)语料库进行评估时,我们的教师模型与XLM-R和MT5相当。我们使用系统中的内域数据对教师模型进行了第二阶段的训练,以提高了3.86%的相对分类,而相对7.01%的插槽填充。我们发现,即使是从我们的2阶段教师模型中提取的170亿参数模型,与仅接受公共数据的2.3B参数老师相比,与2.3B参数老师相比,意图分类更好2.88%,并且7.69%的插槽填充错误率更好(第1阶段),强调了。内域数据对训练的重要性。当使用标记的NLU数据进行离线评估时,我们的17m参数阶段2蒸馏模型的表现分别优于XLM-R碱基(85m Params)和Distillbert(42m Params),分别优于4.23%至6.14%。最后,我们介绍了一个完整的虚拟助手实验平台的结果,在该平台中,我们发现使用经过预训练和蒸馏管道训练的模型超过了从8500万参数教师蒸馏的模型,在自动测量全系统用户不满的自动测量中,从8500万参数教师蒸馏出3.74%-4.91%。
translated by 谷歌翻译
面部及其表达是数字图像的有效科目之一。检测图像的情绪是计算机视野领域的古代任务;然而,从图像进行反向合成的面部表达式 - 是非常新的。使用不同面部表情的再生图像的这种操作,或者改变图像中的现有表达需要生成的对抗网络(GaN)。在本文中,我们的目标是使用GaN改变图像中的面部表情,其中具有初始表达式(即,快乐)的输入图像被改变为同一个人的不同表达式(即,厌恶)。我们在Mug数据集的修改版本上使用了Stargn技术来完成此目标。此外,我们通过在从给定文本中的情感指示的图像中重塑面部表情进一步扩展我们的工作。因此,我们应用了一个长期的短期内存(LSTM)方法来从文本中提取情绪并将其转发给我们的表达式更改模块。作为我们的工作管道的演示,我们还创建了一个博客的应用程序原型,该博客将根据用户的文本情绪与不同的表达式重新生成配置文件图片。
translated by 谷歌翻译
This paper presents our solutions for the MediaEval 2022 task on DisasterMM. The task is composed of two subtasks, namely (i) Relevance Classification of Twitter Posts (RCTP), and (ii) Location Extraction from Twitter Texts (LETT). The RCTP subtask aims at differentiating flood-related and non-relevant social posts while LETT is a Named Entity Recognition (NER) task and aims at the extraction of location information from the text. For RCTP, we proposed four different solutions based on BERT, RoBERTa, Distil BERT, and ALBERT obtaining an F1-score of 0.7934, 0.7970, 0.7613, and 0.7924, respectively. For LETT, we used three models namely BERT, RoBERTa, and Distil BERTA obtaining an F1-score of 0.6256, 0.6744, and 0.6723, respectively.
translated by 谷歌翻译
The existing methods for video anomaly detection mostly utilize videos containing identifiable facial and appearance-based features. The use of videos with identifiable faces raises privacy concerns, especially when used in a hospital or community-based setting. Appearance-based features can also be sensitive to pixel-based noise, straining the anomaly detection methods to model the changes in the background and making it difficult to focus on the actions of humans in the foreground. Structural information in the form of skeletons describing the human motion in the videos is privacy-protecting and can overcome some of the problems posed by appearance-based features. In this paper, we present a survey of privacy-protecting deep learning anomaly detection methods using skeletons extracted from videos. We present a novel taxonomy of algorithms based on the various learning approaches. We conclude that skeleton-based approaches for anomaly detection can be a plausible privacy-protecting alternative for video anomaly detection. Lastly, we identify major open research questions and provide guidelines to address them.
translated by 谷歌翻译
Adversarial training is an effective approach to make deep neural networks robust against adversarial attacks. Recently, different adversarial training defenses are proposed that not only maintain a high clean accuracy but also show significant robustness against popular and well studied adversarial attacks such as PGD. High adversarial robustness can also arise if an attack fails to find adversarial gradient directions, a phenomenon known as `gradient masking'. In this work, we analyse the effect of label smoothing on adversarial training as one of the potential causes of gradient masking. We then develop a guided mechanism to avoid local minima during attack optimization, leading to a novel attack dubbed Guided Projected Gradient Attack (G-PGA). Our attack approach is based on a `match and deceive' loss that finds optimal adversarial directions through guidance from a surrogate model. Our modified attack does not require random restarts, large number of attack iterations or search for an optimal step-size. Furthermore, our proposed G-PGA is generic, thus it can be combined with an ensemble attack strategy as we demonstrate for the case of Auto-Attack, leading to efficiency and convergence speed improvements. More than an effective attack, G-PGA can be used as a diagnostic tool to reveal elusive robustness due to gradient masking in adversarial defenses.
translated by 谷歌翻译
Objective: Despite numerous studies proposed for audio restoration in the literature, most of them focus on an isolated restoration problem such as denoising or dereverberation, ignoring other artifacts. Moreover, assuming a noisy or reverberant environment with limited number of fixed signal-to-distortion ratio (SDR) levels is a common practice. However, real-world audio is often corrupted by a blend of artifacts such as reverberation, sensor noise, and background audio mixture with varying types, severities, and duration. In this study, we propose a novel approach for blind restoration of real-world audio signals by Operational Generative Adversarial Networks (Op-GANs) with temporal and spectral objective metrics to enhance the quality of restored audio signal regardless of the type and severity of each artifact corrupting it. Methods: 1D Operational-GANs are used with generative neuron model optimized for blind restoration of any corrupted audio signal. Results: The proposed approach has been evaluated extensively over the benchmark TIMIT-RAR (speech) and GTZAN-RAR (non-speech) datasets corrupted with a random blend of artifacts each with a random severity to mimic real-world audio signals. Average SDR improvements of over 7.2 dB and 4.9 dB are achieved, respectively, which are substantial when compared with the baseline methods. Significance: This is a pioneer study in blind audio restoration with the unique capability of direct (time-domain) restoration of real-world audio whilst achieving an unprecedented level of performance for a wide SDR range and artifact types. Conclusion: 1D Op-GANs can achieve robust and computationally effective real-world audio restoration with significantly improved performance. The source codes and the generated real-world audio datasets are shared publicly with the research community in a dedicated GitHub repository1.
translated by 谷歌翻译
We present a novel image inversion framework and a training pipeline to achieve high-fidelity image inversion with high-quality attribute editing. Inverting real images into StyleGAN's latent space is an extensively studied problem, yet the trade-off between the image reconstruction fidelity and image editing quality remains an open challenge. The low-rate latent spaces are limited in their expressiveness power for high-fidelity reconstruction. On the other hand, high-rate latent spaces result in degradation in editing quality. In this work, to achieve high-fidelity inversion, we learn residual features in higher latent codes that lower latent codes were not able to encode. This enables preserving image details in reconstruction. To achieve high-quality editing, we learn how to transform the residual features for adapting to manipulations in latent codes. We train the framework to extract residual features and transform them via a novel architecture pipeline and cycle consistency losses. We run extensive experiments and compare our method with state-of-the-art inversion methods. Qualitative metrics and visual comparisons show significant improvements. Code: https://github.com/hamzapehlivan/StyleRes
translated by 谷歌翻译